SHREC: a short-read error correction method
Abstract
MOTIVATION: Second-generation sequencing technologies produce a massive amount of short reads in a single experiment. However, sequencing errors can cause major problems when using this approach for de novo sequencing applications. Moreover, existing error correction methods have been designed and optimized for shotgun sequencing. Therefore, there is an urgent need for fast and accurate computational methods and tools for error correction of large amounts of short-read data. RESULTS: We present SHREC, a new algorithm for correcting errors in short-read data that uses a generalized suffix trie on the read data as the underlying data structure. Our results show that the method can identify erroneous reads with sensitivity and specificity of over 99% and 96%, respectively, for simulated data with error rates of up to 3% as well as for real data. Furthermore, it achieves an error correction accuracy of over 80% for simulated data and over 88% for real data. These results are clearly superior to previously published approaches. SHREC is available as an efficient open-source Java implementation that allows processing of 10 million short reads on a standard workstation.
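As a rough illustration of why a generalized suffix trie over the reads is useful for error detection, the sketch below counts how often each substring occurs across all reads and flags substrings that are much rarer than a sibling differing only in the last base. This is a minimal sketch under stated assumptions, not SHREC's actual procedure: the abstract does not describe the detection rule, so the function names, the `min_ratio` threshold, and the use of a dictionary in place of an explicit trie are all hypothetical choices made for illustration.

```python
from collections import defaultdict

ALPHABET = "ACGT"


def build_suffix_trie_counts(reads, max_depth):
    """Count every substring of length <= max_depth across all reads.

    A dict keyed by substring stands in for an explicit trie: each key is
    the path label of one trie node and its value is that node's weight
    (the number of read suffixes passing through it).
    """
    counts = defaultdict(int)
    for read in reads:
        for start in range(len(read)):
            last = min(start + max_depth, len(read))
            for end in range(start + 1, last + 1):
                counts[read[start:end]] += 1
    return counts


def suspicious_nodes(counts, depth, min_ratio=0.5):
    """Return (rare_label, likely_label) pairs at the given depth.

    Assumption for illustration: a node is suspicious when its weight is
    below min_ratio times the weight of its heaviest sibling.
    """
    flagged = []
    for label, weight in counts.items():
        if len(label) != depth:
            continue
        prefix, last_base = label[:-1], label[-1]
        # Siblings share the same prefix and differ only in the final base.
        siblings = [prefix + b for b in ALPHABET if b != last_base]
        best = max(siblings, key=lambda s: counts.get(s, 0))
        best_weight = counts.get(best, 0)
        if best_weight > 0 and weight < min_ratio * best_weight:
            flagged.append((label, best))
    return sorted(flagged)


# Three identical reads plus one read with a single substitution (T -> A).
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
counts = build_suffix_trie_counts(reads, max_depth=4)
print(suspicious_nodes(counts, depth=4))  # [('ACGA', 'ACGT')]
```

In this toy input the substring `ACGA` occurs once while its sibling `ACGT` occurs three times, so the rare branch is flagged and the heavier sibling is suggested as the likely original. A real implementation would need a memory-efficient trie and a statistically motivated threshold.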
Similar resources
Correction of sequencing errors in a mixed set of reads
MOTIVATION: High-throughput sequencing technologies produce large sets of short reads that may contain errors. These sequencing errors make de novo assembly challenging. Error correction aims to reduce the error rate prior to assembly. Many de novo sequencing projects use reads from several sequencing technologies to get the benefits of all used technologies and to alleviate their shortcomings. How...
Modellierung und Alignment von Genom-Sequenzdaten
This dissertation examines the main phases of a sequencing process. It shows how these phases can be modeled, optimized, or created differently to produce a short-read alignment that exhibits higher performance and quality. To achieve this result, a model of short reads is deduced. With this model it is possible to derive formulas that allow good estimations of the result of a sequencing proces...
Improved long read correction for de novo assembly using an FM-index
Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limits their application to complex genomes. One solution is to decrease the cost and t...
Reptile: representative tiling for short read error correction
MOTIVATION Error correction is critical to the success of next-generation sequencing applications, such as resequencing and de novo genome sequencing. It is especially important for high-throughput short-read sequencing, where reads are much shorter and more abundant, and errors more frequent than in traditional Sanger sequencing. Processing massive numbers of short reads with existing error co...
The Relationship between Education and Health: Vector Error Correction Model (VECM)
Background & objectives: Despite the importance and impact of health and education on economic growth in countries, the causal relationship between education and health is important to policymaking. This study aimed to investigate the causality relationship between education and health in the short and long runs using the Vector Error Correction Model (VECM) in Iran. Method: This was an analyt...
Journal: Bioinformatics
Volume 25, Issue 17
Publication date: 2009